SCREENING INTERACTOR MOLECULES WITH WHOLE GENOME OLIGONUCLEOTIDE OR POLYNUCLEOTIDE
ARRAYS



(57) Abstract 

This invention relates to methods for the identification of nucleic 
acids by direct hybridization to high-density oligonucleotide arrays. The 
methods of this invention comprise the steps of: (a) screening a DNA library, 
such as an S. cerevisiae genomic DNA library, by performing a double
hybrid screening method with a recombinant vector containing a DNA insert 
encoding a candidate protein of interest and then selecting the clones from 
the DNA library that code for proteins that interact with the candidate protein 
of interest; and (b) hybridizing the DNA inserts contained in the clones that 
have been selected in step (a) using an oligonucleotide probe matrix wherein 
the probe locations on the host genome cover all of the coding sequences, 
determining the hybridization location and consequently, the gene coding 
for a specific protein that interacts with the candidate protein of interest in 
the double hybrid screening system. This invention is also directed to the 
polynucleotides obtained by the methods of this invention, the polypeptides 
encoded by those polynucleotides and the DNA arrays utilized in the methods 
of this invention. 
SCREENING INTERACTOR MOLECULES WITH WHOLE
GENOME OLIGONUCLEOTIDE OR POLYNUCLEOTIDE ARRAYS

BACKGROUND OF THE INVENTION 
An estimated 6,000 genes were identified upon the completion of sequencing the
Saccharomyces cerevisiae genome. Fewer than half of these genes have a known biological
function (1,2). Understanding how these newly sequenced genes function in both defined
and emerging biochemical pathways is a major challenge for researchers in the post-genome
era. Efficient functional characterization of these genes requires strategies for scaling genetic
analyses to the whole genome level (3). mRNA gene expression patterns, disruption
phenotypes, and protein-protein interactions are key questions that need to be
addressed for every gene in a genome.

Plasmid-based library selections are an established approach to the functional analysis
of uncharacterized genes, and can help elucidate biological function by identifying, for
example, physical interactors for a gene and genetic enhancers and suppressors of mutant
phenotypes. However, the application of these selections to every gene in a eukaryotic
genome involves the need to manipulate and sequence hundreds of DNA plasmids. Thus,
applying traditional methods of functional analysis to every gene in a genome is limited by
labor and cost.

Because the discovery of thousands of uncharacterized genes by genome sequencing
projects has increased the need for methods of large scale functional analysis, several
approaches have been initiated to identify genes that, when disrupted or removed, lead to
selective growth disadvantages (14-16). A promising complementary approach is the
application of established genetic screens to every gene in an organism in an attempt to
assign a biological function to every open reading frame. Genome-wide analyses based on
two-hybrid screens, enhanced synthetic lethal screens, and screens for signal peptide
sequences have been proposed (17-19).

The two-hybrid assay exploits the ability of a pair of interacting proteins to bring a
transcription activation domain into close proximity with a DNA-binding site that regulates
the expression of an adjacent reporter gene. The assay employs chimeric genes which express
two types of hybrid proteins. The first hybrid protein contains a transcriptional activation
domain fused to a first test protein. The second hybrid protein contains the DNA-binding
domain of a transcriptional activator fused to a second test protein. If the two test proteins
are able to interact, they bring the two domains of the transcriptional activator into
sufficiently close proximity to cause transcription, which can then be detected by the
activity of a marker gene that contains a binding site for the DNA-binding domain.
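
The logic of the assay can be summarized in a short Python sketch. It is only an illustrative model, not part of the disclosed methods: the `Hybrid` class, the `reporter_active` function and the example interaction set are hypothetical names introduced here, and real reporter activation is quantitative rather than a yes/no outcome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hybrid:
    domain: str        # "AD" (activation domain) or "BD" (DNA-binding domain)
    test_protein: str  # name of the fused test protein


def reporter_active(first: Hybrid, second: Hybrid, interactions: set) -> bool:
    """Return True when the two test proteins interact and thereby bring the
    activation domain and the DNA-binding domain together."""
    if {first.domain, second.domain} != {"AD", "BD"}:
        return False  # both halves of the transcriptional activator are required
    pair = frozenset((first.test_protein, second.test_protein))
    return pair in interactions


# Hypothetical interaction set, used only for illustration.
known = {frozenset(("X", "Y"))}
print(reporter_active(Hybrid("AD", "X"), Hybrid("BD", "Y"), known))  # True
print(reporter_active(Hybrid("AD", "X"), Hybrid("BD", "Z"), known))  # False
```

The model treats an interaction as symmetric, so it does not matter which test protein carries which domain as long as both domains are present.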

The two-hybrid assay can be used to test a multiplicity of proteins simultaneously to
determine whether they interact with a known protein. For example, a DNA fragment
encoding the DNA-binding domain may be fused to a DNA fragment encoding the known
protein in order to provide one hybrid. This hybrid is introduced into the cells carrying a
marker gene. For the first hybrid, a library of plasmids can be constructed which may include,
for example, total mammalian cDNA fused to the DNA sequence encoding the activation
domain. This library is introduced into the cells carrying the second hybrid. If any individual
plasmid from the library encodes a protein that is capable of interacting with the known
protein, a positive signal will be obtained. However, because repetitive dideoxy sequencing
is required to exhaustively identify the results of a screen, application of these methods to
tens of thousands of genes is also limited by time, labor, and expense.

Two-hybrid screens for protein-protein interactions provide a genetic tool that can
be applied, in principle, to every gene in a genome. The Escherichia coli bacteriophage T7
genome has already been characterized with exhaustive two-hybrid screening and sequencing
for each known gene. Even with the use of novel strategies for highly efficient two-hybrid
screening, however, an analysis of all genes encoded in the human genome would require
sequencing of approximately 1 x 10^ sequence fragments. As an alternative, genes may be
individually cloned into two-hybrid vectors and tested in a pairwise manner. One
disadvantage of this approach is that testing only the full length form of a gene might fail to
identify those interactions that occur only with isolated domains of a protein (20). Functional
selections that need to be performed in mammalian cells would also benefit from more highly
parallel analysis. For example, it is conceivable to select for human genes that yield
phenotypes, such as increased drug or pathogen resistance, when overexpressed in cell lines.
The use of array hybridization to analyze results from these screens would eliminate the need
to maintain large numbers of individual clones in tissue culture until they can be sequenced.
Thus, the present invention overcomes the problems associated with the prior art through
the use of DNA arrays or matrices, permitting highly parallel identification of the sequence
and orientation of nucleic acid elements in a pool.
SUMMARY OF THE INVENTION

The methods of this invention comprise the steps of: (a) screening a DNA library,
such as an S. cerevisiae genomic DNA library, by performing a double hybrid method with
a recombinant vector containing a DNA insert encoding a candidate protein of interest and
then selecting the clones from the DNA library that code for proteins that interact with the
candidate protein of interest; and (b) hybridizing the DNA inserts contained in the clones that
have been selected in step (a) using an oligonucleotide probe matrix, wherein the probe
locations on the host genome cover all of the coding sequences, determining the
hybridization location and consequently, the gene coding for a specific protein that interacts
with the candidate protein of interest in the double hybrid screening system. Thus, the
methods of this invention allow screening at a very large scale for DNA sequences having
functional utility and avoid the systematic sequencing of the DNA inserts of interest required
by prior art methods.

This invention is also directed to the polynucleotides obtained by the methods of this 
invention and the polypeptides encoded by those polynucleotides. In addition, the invention 
is directed to the DNA arrays or matrices utilized in the methods of this invention. 

Oligonucleotide arrays can be synthesized for any organism for which complete or
partial sequence information is available. The time required to analyze the results of a genetic
selection can be drastically reduced, making it feasible to apply conventional screens to very
large numbers of genes in a mammalian genome. Analysis of screens by array hybridization
is adaptable to any genome-wide functional selection or experiment where the output is a set
of nucleic acid sequences.

For example, DNA arrays containing oligonucleotides complementary to every gene
in the Saccharomyces cerevisiae genome can be used to analyze the results from plasmid
based genetic screens in a single experiment. Based on the recently completed sequence of
Saccharomyces cerevisiae, the first high density arrays containing oligonucleotides
complementary to every gene in the yeast genome have been designed and synthesized.
Two-hybrid protein-protein interaction screens were carried out for Saccharomyces
cerevisiae genes implicated in mRNA splicing and microtubule assembly. Hybridization of
labeled DNA derived from positive clones is sufficient to characterize the results of a screen
in a single experiment, allowing rapid detection of both established and novel biological
interactions. These results demonstrate the use of oligonucleotide arrays for the analysis of
two-hybrid screens. This approach is generally applicable to the analysis of a range of genetic
selections with outputs of high complexity.
BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 represents a method for identifying sequences following a genetic selection.
Rather than individual purification and dideoxy sequencing, all clones are pooled from plates,
and plasmid DNA is isolated in a single purification. PCR amplification using primers with
3' sequence corresponding to the vector sequence is used to selectively enrich for insert DNA
from the plasmid pool. Amplified insert DNA is fragmented with DNAse I, labeled with
biotin-ddATP, and hybridized to an array containing oligonucleotide probes for every gene
in the yeast genome.

Figures 2a and 2b depict fluorescence images of a high-density oligonucleotide
array containing 25-mer probes for nearly every gene on Saccharomyces cerevisiae
chromosomes 5 through 10. Fig. 2a depicts the fluorescence pattern obtained following
hybridization of 11 control genes: YEL002c, YEL003w, YEL005c, YEL006w, YEL015w,
YEL019c, YEL021w, YEL024w, YHLOMc, YHL045w, and YHL044c. Dark areas
correspond to probes for genes not present in the control pool. Fig. 2b provides a close-up
view of gene YHLOMc, which shows the exact probe features that hybridize to the insert.
The red grid highlights all probe features for YHLOMc. The top row of probe elements contains
oligonucleotides perfectly complementary to the gene sequence, while the bottom rows contain a
mismatch in the central position of the oligonucleotide. Approximate locations of
complementary oligonucleotide probes along the YHLOMc ORF are also shown.

Figure 3 depicts a fluorescence image of a portion of a high-density oligonucleotide
array containing 25-mer probes to nearly every gene on Saccharomyces cerevisiae
chromosomes 5 through 10 following hybridization of the YMR117c two-hybrid sample. The
three lighted strips correspond to probes covering nucleotides 156-654 of ORF YER018c,
nucleotides 1860-2484 of YER032w, and nucleotides 4092-4452 of YGL197w. Terminal
probes are described as the most 5' nucleotide of the most 5' probe and the most 3'
nucleotide of the most 3' probe that gave a positive signal. Dark areas correspond to probes
for genes not present following genetic selection.
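
The terminal-probe convention used for Figure 3 can be illustrated with a minimal Python sketch. The function name `hybridized_region` and the probe start positions are hypothetical; the positions were invented so that the result matches the 156-654 span quoted for YER018c, and they are not the actual array layout.

```python
PROBE_LENGTH = 25  # 25-mer probes, as on the arrays described here


def hybridized_region(probe_starts, positive, probe_length=PROBE_LENGTH):
    """Return (first_nt, last_nt) of the ORF region covered by positive probes.

    probe_starts: 1-based ORF coordinate of the 5' nucleotide of each probe.
    positive: booleans, True where the probe gave a positive signal.
    Returns None when no probe is positive.
    """
    hits = [s for s, p in zip(probe_starts, positive) if p]
    if not hits:
        return None
    first_nt = min(hits)                    # most 5' nucleotide of the most 5' probe
    last_nt = max(hits) + probe_length - 1  # most 3' nucleotide of the most 3' probe
    return first_nt, last_nt


# Hypothetical probe layout for one ORF (positions invented for illustration).
starts = [40, 156, 300, 480, 630, 900]
calls = [False, True, True, True, True, False]
print(hybridized_region(starts, calls))  # (156, 654)
```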
DETAILED DESCRIPTION OF THE INVENTION

The present invention provides methods for screening polynucleotides, such as
polynucleotides contained in the genome or in a cDNA obtained from the mRNA of a given
prokaryotic or eukaryotic host or in a DNA insert of a random peptide DNA library. In
essence, the methods of this invention comprise the steps of: a) subjecting the polynucleotide
of interest to a two-hybrid screening method; and b) subjecting the polynucleotides selected
at step a) to a hybridization reaction onto a matrix substrate onto which oligonucleotide or
polynucleotide probes have been immobilized (i.e., a DNA array).

Any two-hybrid screening method may be used to complete step a) of the methods
of this invention. For example, the yeast two-hybrid system developed by Fields and
coworkers (21) utilizes hybrid genes to detect protein-protein interactions by means of direct
activation of reporter-gene expression. U.S. Patents Nos. 5,283,173 and 5,468,614
describing this technique are relied upon and incorporated by reference. Mammalian
two-hybrid systems using β-galactosidase complementation to monitor protein-protein
interactions in intact eukaryotic cells (22, 23), phage display (24) and double tagging assays
(25) represent alternative two-hybrid assay approaches to screen complex libraries of
proteins for direct interaction with a given ligand. In addition, reverse two-hybrid screening
procedures, such as those described by White (26) and Vidal et al. (27, 28), can be utilized
in the methods of this invention. Most preferably, the two-hybrid system utilized in the
methods of this invention is that described by Daniel Ladant et al. in U.S. provisional patent
application No. 60/067,308 entitled A BACTERIAL MULTI-HYBRID SYSTEM AND
APPLICATIONS THEREOF, filed December 4, 1997, the entire disclosure of which is
relied upon and incorporated herein by reference.

The preparation and use of high density DNA arrays has been described in
International patent applications WO 97/29212, WO 97/27317, WO 97/10365, and WO
92/10588, the disclosures of which are relied upon and incorporated herein by reference.
See also Wodicka, L. et al. (1997) Nature Biotechnology, 15, 1359-1367.
One embodiment of this invention (designated "Method 1" for convenience) provides
a method for selecting a polynucleotide encoding a first polypeptide that is able to interact
with a second polypeptide of interest. Specifically, this method comprises the following
steps:

a) providing a recombinant host cell containing a detectable gene, wherein the
detectable gene expresses a detectable polypeptide when the detectable gene is activated by
an amino acid sequence including a transcriptional activation domain, such as the
transcription activation domain of the GAL4 protein;
b) providing a first chimeric gene that is capable of being expressed in the host cell,
the first chimeric gene comprising a DNA sequence that encodes a first hybrid polypeptide
encoded by a given prokaryotic or eukaryotic organism, said first hybrid polypeptide
comprising:

(i) the transcriptional activation domain; and

(ii) a first test polypeptide that is to be tested for interaction with the second
test polypeptide;

c) providing a second chimeric gene that is capable of being expressed in the host
cell, the second chimeric gene comprising a DNA sequence that encodes a second hybrid
polypeptide, the second hybrid polypeptide comprising:

(i) a DNA-binding domain, e.g., the DNA binding domain of the GAL4
protein, that recognizes a binding site on the detectable gene in the host
cell; and

(ii) a second test polypeptide that is to be tested for interaction with at least
one first test polypeptide;

wherein interaction between the first test polypeptide and the second test polypeptide
in the host cell causes the transcriptional activation domain to activate transcription of the
detectable gene;

d) introducing the first chimeric gene and the second chimeric gene into the host
cell;

e) subjecting the host cell to conditions under which the first hybrid polypeptide and
the second hybrid polypeptide are expressed in sufficient quantity for the detectable gene to
be activated;

f) selecting the host cell clones for which the detectable gene has been expressed to
a degree greater than expression in the absence of interaction between the first test
polypeptide and the second test polypeptide;

g) optionally pooling the clones that have been positively selected at step f);

h) amplifying the polynucleotides of interest contained in the clones of step f) or g)
with a pair of oligonucleotide primers respectively hybridizing with a plasmid sequence
located at the 5' end of the polynucleotide of interest and with a sequence complementary
to a plasmid sequence located at the 3' end of the polynucleotide of interest coding for the
first polypeptide;



i) hybridizing the amplified polynucleotides obtained at step h) to a matrix substrate
on which has been bound, at known locations, a plurality of sets of oligonucleotide or
polynucleotide probes of predetermined sequence, each bound set of oligonucleotide or
polynucleotide probes being able to hybridize with a specific polynucleotide carried by the
genome of the organism from which the polynucleotide coding for the first test polypeptide
belongs;

j) detecting the locations of the polynucleotide hybrid complexes obtained at step 
i) on the matrix substrate; and 

k) optionally determining the quantity of each hybrid complex detected at step j). 

Most preferably, the second chimeric gene is provided to the recombinant cell host 
before the introduction of the first chimeric gene. 

An alternate embodiment of the invention (designated "Method 2" for convenience) 
provides a method for selecting a polynucleotide encoding a first polypeptide that inhibits 
the interaction between a second polypeptide and a third polypeptide. Specifically, this 
method comprises the following steps: 

a) providing a recombinant host cell containing a detectable gene, wherein the
detectable gene expresses a detectable polypeptide when the detectable gene is activated by
an amino acid sequence including a transcriptional activation domain, e.g., GAL4;

b) providing a first gene that is capable of being expressed in the host cell, said first
gene comprising a DNA sequence that encodes a first polypeptide encoded by a given
prokaryotic or eukaryotic organism, and for which its inhibition property on the interaction
between a second and a third polypeptide is tested;

c) providing a second chimeric gene that is capable of being expressed in the host cell,
the second chimeric gene comprising a DNA sequence that encodes a second hybrid
polypeptide encoded by a given prokaryotic or eukaryotic organism, said second hybrid
polypeptide comprising:

(i) the transcriptional activation domain; and 

(ii) a second test polypeptide that interacts with a third polypeptide; 

d) providing a third chimeric gene that is capable of being expressed in the host cell, 
the third chimeric gene comprising a DNA sequence that encodes a third hybrid polypeptide, 
the third hybrid polypeptide comprising: 
(i) a DNA-binding domain, such as GAL4, that recognizes a binding site on
the detectable gene in the host cell; and

(ii) a third test polypeptide that interacts with the second test polypeptide;

wherein interaction between the second test polypeptide and the third test
polypeptide in the host cell causes the transcriptional activation domain to activate
transcription of the detectable gene;

e) introducing the first gene, the second chimeric gene, and the third chimeric gene 
into the host cell; 

f) subjecting the host cell to conditions under which the second hybrid polypeptide
and the third hybrid polypeptide are expressed in sufficient quantity for the detectable gene to be
activated;

g) selecting the host cell clones for which the detectable gene has been expressed to
a degree lesser than its expression level in the absence of expression of the first polypeptide;

h) optionally pooling the clones that have been positively selected at step g); 

i) amplifying the polynucleotides of interest contained in the clones of step g) or h)
with a pair of oligonucleotide primers respectively hybridizing with a plasmid sequence
located at the 5' end of the polynucleotide of interest and with a sequence complementary
to a plasmid sequence located at the 3' end of the polynucleotide of interest coding for the
first polypeptide;

j) hybridizing the amplified polynucleotides obtained at step i) to a matrix substrate
on which has been bound, at known locations, a plurality of sets of oligonucleotide or
polynucleotide probes of predetermined sequence, each bound set of oligonucleotide or
polynucleotide probes being able to hybridize with a specific polynucleotide carried by the
genome of the organism from which the polynucleotide coding for the first test polypeptide
belongs;

k) detecting the locations of the polynucleotide hybrid complexes obtained at step
j) on the matrix substrate;

l) optionally determining the quantity of each hybrid complex detected at step k).

Most preferably, the second and the third chimeric genes are provided to the
recombinant cell host before the introduction of the first chimeric gene.
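
The difference between the selection criteria of Method 1 (step f) and Method 2 (step g) can be sketched as follows. This is only an illustration under simplifying assumptions: the function `select_clones`, the clone names and the reporter values are hypothetical, and a single numeric baseline stands in for the control expression level.

```python
def select_clones(reporter_levels, baseline, mode):
    """Pick clones by reporter expression relative to a baseline.

    mode="forward": keep clones expressing MORE than the baseline measured
        in the absence of interaction (Method 1, step f).
    mode="reverse": keep clones expressing LESS than the baseline measured
        in the absence of the first polypeptide (Method 2, step g).
    """
    if mode == "forward":
        return sorted(c for c, v in reporter_levels.items() if v > baseline)
    if mode == "reverse":
        return sorted(c for c, v in reporter_levels.items() if v < baseline)
    raise ValueError("mode must be 'forward' or 'reverse'")


# Invented reporter levels in arbitrary units.
levels = {"clone_a": 9.5, "clone_b": 1.0, "clone_c": 0.2}
print(select_clones(levels, baseline=1.0, mode="forward"))  # ['clone_a']
print(select_clones(levels, baseline=1.0, mode="reverse"))  # ['clone_c']
```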

In Method 2 of the present invention, the first chimeric gene is preferably expressed
under the control of an inducible promoter. Thus, the recombinant cell host that has been
transformed with the three chimeric genes first expresses constitutively the second and the
third chimeric gene in order to allow the interaction of the resulting second and third fusion
polypeptides to take place. Then the expression of the first chimeric gene is induced using
the appropriate inducing signal, such as the addition of an inducer molecule in the culture
medium. For example, the inducible promoter Met 3E (inducible by the amino acid
methionine) (29) may be used to control the expression of the first chimeric gene.

For the purpose of describing this invention, a gene or a chimeric gene means a
polynucleotide that encodes a polypeptide or a fusion polypeptide respectively, wherein the
polynucleotide may or may not additionally include a polynucleotide sequence that drives its
expression at the transcriptional or translational level.

In a preferred embodiment of the methods of this invention, some of the
polynucleotides obtained at step f) or g) of Method 1 or step g) or h) of Method 2 are
(simultaneously with completion of the remaining steps in each method with the remaining
polynucleotides) subjected to a DNA amplification reaction with a pair of primers, wherein
at least one of the primers comprises, at its 5' end, a promoter region recognized by a specific
RNA polymerase (e.g., the bacteriophage T7 promoter region) and then incubated in the
presence of the corresponding RNA polymerase, such as the bacteriophage T7 polymerase,
in an acellular enzyme medium. The mRNA is then further incubated in the presence of a
reverse transcriptase type enzyme and the resulting cDNA molecule is hybridized to a matrix
substrate on which has been bound, at known locations, a plurality of sets of oligonucleotides
or polynucleotides of predetermined sequence, each bound set of oligonucleotides being able
to hybridize with a specific polynucleotide carried by the genome of the organism from which
the polynucleotide coding for the first test polypeptide belongs. The polynucleotide hybrid
complexes obtained on the matrix substrate are then detected and compared with the results
obtained from the matrix of Method 1 or Method 2.
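
A primer carrying a promoter region at its 5' end can be assembled by simple concatenation, as in the following sketch. The T7 promoter sequence shown is the commonly used 20-nucleotide form; the vector-specific portion and the function name `add_t7_promoter` are hypothetical and are not taken from any plasmid described here.

```python
T7_PROMOTER = "TAATACGACTCACTATAGGG"  # commonly used bacteriophage T7 promoter, 5'->3'


def add_t7_promoter(vector_primer: str) -> str:
    """Prepend the T7 promoter to the 5' end of a vector-specific primer."""
    primer = vector_primer.upper()
    if not primer or set(primer) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return T7_PROMOTER + primer


# Hypothetical vector-specific sequence, for illustration only.
print(add_t7_promoter("gatgaagatacc"))
# TAATACGACTCACTATAGGGGATGAAGATACC
```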

It will be noted in the practice of the methods of this invention, that the
polynucleotide inserts of the DNA library used to make the two-hybrid screening step may
begin with a nucleotide which is not in phase with the transcriptional activation domain
coding sequences. Despite the open reading frame shift occurring at the 5' end of the
polynucleotide sequence, it has been observed that a correct polypeptide is synthesized, due
to a probable jump of the ribosome, placing the ribosome back in the correct reading frame.






Consequently, a shift in the reading frame at the beginning of the coding sequence of interest 
does not prevent the synthesis of the correct polypeptide interactor. 

In a most preferred embodiment of the methods according to this invention, the
selected polynucleotides encoding the first polypeptide are labeled before performing the
hybridization step, either during or after the PCR amplification step. The polynucleotide may
be labeled with a radioactive element (32P, 35S, 3H, 125I) or by a non-isotopic molecule (for
example, biotin, acetylaminofluorene, digoxigenin, 5-bromodeoxyuridine, fluorescein).
Examples of non-radioactive labeling of nucleic acid fragments are described in French
Patent No. 78 10975 or in Urdea et al. or Sanchez-Pescador et al. (30, 31). One of skill in the art
will appreciate that other labeling techniques may also be used, such as those described in
French Patents Nos. 2 422 956 and 2 528 755 or in Matthews et al. (32).

One of the most important features of the hybridized DNA arrays or matrices utilized
in the screening methods of this invention is that the DNA arrays allow, in a one step
method, mapping of all the potential polypeptides interacting with a given defined
polypeptide in a forward two-hybrid method, or inhibiting the interaction between two
defined polypeptides in a reverse two-hybrid method. Thus, the hybridization pattern of
oligo- or polynucleotides coding for the interactor polypeptides identifies the whole set of
polypeptides of interest. In contrast, the prior art technique of systematic sequencing of
every selected polynucleotide identified only individual interactor coding sequences and did
not provide any understanding of the global interaction possibilities.

Preferably, the oligonucleotide or polynucleotide probes bound to the substrate 
matrix in the methods of this invention are designed in such a manner that every region of 
the whole genome of the prokaryotic or eukaryotic host organism is able to specifically 
hybridize to at least one set of the oligonucleotide or polynucleotide probes. It is also 
preferred that sets of oligonucleotide or polynucleotide probes bound to the matrix substrate 
are complementary to adjacent sequences in the genome of the prokaryotic or eukaryotic 
host, such that the distance between the sequences is less than one kilobase, preferably less 
than 500 nucleotides and most preferably about 50 nucleotides. 
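
As a rough illustration of this spacing preference, the sketch below picks probe target windows along a sequence at a fixed interval. It assumes that "distance" means the gap between the end of one target window and the start of the next; that reading, the function name `tile_targets` and the placeholder sequence are assumptions made here. The probes actually synthesized would be complementary to the returned windows.

```python
def tile_targets(sequence: str, probe_length: int = 25, spacing: int = 50):
    """Return (start, window) pairs with 1-based starts, where `spacing` is the
    number of nucleotides between the end of one window and the start of the next."""
    if probe_length <= 0 or spacing < 0:
        raise ValueError("probe_length must be > 0 and spacing >= 0")
    step = probe_length + spacing
    windows = []
    for i in range(0, len(sequence) - probe_length + 1, step):
        windows.append((i + 1, sequence[i:i + probe_length]))
    return windows


gene = "ACGT" * 100  # 400-nt placeholder sequence, not a real gene
for start, window in tile_targets(gene):
    print(start, window)
```

With a 25-nucleotide window and a 50-nucleotide gap, the 400-nucleotide placeholder yields six windows starting at positions 1, 76, 151, 226, 301 and 376.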

It will also be apparent that the matrices obtained from the methods of this invention 
are valuable products themselves. Of particular interest is a matrix substrate comprising a 
plurality of immobilized sets of oligonucleotide or polynucleotide probes of predetermined 
sequence, each bound set of oligonucleotide or polynucleotide probes being able to hybridize 






with a specific polynucleotide carried by the genome of the organism from which the 
polynucleotide coding for the first test polypeptide belongs; and at least one polynucleotide 
coding for one selected first test polypeptide being hybridized thereto. 

The DNA arrays used in the methods of the invention preferably contain
oligonucleotide probes of between 10 and 100 nucleotides, and preferably between 10 and
40 nucleotides, and cover the whole genome or part of the genome of interest. In one
embodiment of the invention, the oligonucleotide probes immobilized onto the substrate
matrix consist of Expressed Sequence Tags (ESTs). The DNA arrays of this invention may,
alternatively, contain full length coding polynucleotides corresponding to every identified
gene of the host organism under study. For example, when S. cerevisiae is the target host,
a typical DNA array used in performing the screening methods of the invention may contain
6000 full length polynucleotides, each polynucleotide comprising the full length coding
sequence of a gene among the 6000 genes identified for S. cerevisiae.

Because the screening methods according to this invention make use of DNA probe 
arrays in order to identify the selected polynucleotides coding for the interactor polypeptides 
of interest, the methods are particularly well suited to polynucleotides derived from a host 
organism for which the whole genome has already been sequenced. However, the methods 
of this invention may also be applied to polynucleotides issued from a library generated from 
specific partially or totally sequenced chromosomes of complex host organisms, including 
humans. In one specific embodiment of the methods of this invention, the method is 
performed using, as a source of polynucleotide sequences to be tested, a library of randomly 
synthesized and identified polynucleotides. 

It will be readily apparent to those of skill in the art that application of the methods 
of this invention will lead to the identification of novel polynucleotides and their functions. 
These polynucleotides and the polypeptides encoded by these polynucleotides are within the 
scope of this invention. Of particular interest are peptides comprising a peptide domain that 
interacts with the second test polypeptide of interest. 

EXAMPLES

Preparation of oligonucleotide arrays 

Oligonucleotide arrays containing over 65,000 DNA synthesis features were
prepared using light-directed, solid phase combinatorial chemistry as previously described
(6, 7). Each 50 x 50 µm synthesis feature is comprised of more than 10' copies of a discrete
25-mer oligonucleotide that is complementary to a portion of a yeast gene. The full set of
oligonucleotides includes an average of twenty synthesis features for each of the 6,321 genes
identified from the Saccharomyces cerevisiae genome. These arrays were originally designed
and used for the analysis of mRNA gene expression (Wodicka, L., Dong, H., Mittmann, M.,
Ho, M. H., and Lockhart, D. J., Nat. Biotechnology, 1997, 15, 1359-1367).

Oligonucleotide arrays were first tested for the ability to identify specific gene
fragments. A fluorescence image of an array following hybridization of eleven labeled PCR
products reveals intense signals at discrete positions, with minimal background (Fig. 2a).
Because the probes for a given gene are synthesized in adjacent positions, hybridization of
PCR products is detected as horizontal rows of high intensity (Fig. 2b). Signal corresponding
to all eleven genes was detected in the correct locations. No significant signal was detected
for any other genes in the genome. Each experiment was performed in duplicate, and
hybridization results were found to be reproducible (data not shown).
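
Since each perfectly complementary probe is paired on the array with a probe carrying a central mismatch (Fig. 2b), one simple way to decide whether a gene is present is to count probe pairs in which the perfect-match signal clearly exceeds the mismatch signal. The sketch below is an assumed decision rule for illustration only; the thresholds, intensities and the function name `gene_detected` are hypothetical and do not describe the analysis actually used.

```python
def gene_detected(pm, mm, min_diff=100.0, min_fraction=0.5):
    """Call a gene present when enough probe pairs show PM clearly above MM.

    pm, mm: equal-length lists of perfect-match / mismatch intensities.
    min_diff and min_fraction are arbitrary illustrative thresholds.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    positive_pairs = sum(1 for p, m in zip(pm, mm) if p - m >= min_diff)
    return positive_pairs / len(pm) >= min_fraction


# Invented intensities in arbitrary fluorescence units.
present = gene_detected([900, 850, 40, 700], [60, 55, 35, 80])
absent = gene_detected([50, 45, 60, 40], [48, 50, 55, 42])
print(present, absent)  # True False
```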

After a biological selection, library elements in high abundance can be identified by
dideoxy sequencing. However, detection of rare elements might require the sequencing of
thousands of clones. To determine the ability to detect very rare elements using array
hybridization, the control PCR products were remade without the 600 bp YEL006c gene
fragment, and known amounts of this sequence were added to the pool. Concentrations of
spiked YEL006c DNA as low as 5 pM were detectable by hybridization. Therefore, array
hybridization is sensitive to library elements that comprise less than 1:10,000 of the total
pool. This is consistent with previous gene expression experiments in which rare mRNAs
present at frequencies below 1:100,000 were detected quantitatively (7).
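
To put the sequencing burden in perspective, the number of randomly picked clones needed to observe a rare element at least once can be estimated from 1 - (1 - f)^n >= confidence. The sketch below assumes independent, uniform sampling of clones, which is a simplification; the function name `clones_to_sequence` and the 95% confidence level are choices made here for illustration.

```python
import math


def clones_to_sequence(frequency: float, confidence: float = 0.95) -> int:
    """Smallest number of randomly picked clones needed to observe an element
    of the given frequency at least once with the given probability."""
    if not (0 < frequency < 1) or not (0 < confidence < 1):
        raise ValueError("frequency and confidence must lie strictly between 0 and 1")
    # P(at least one hit in n picks) = 1 - (1 - f)^n >= confidence
    return math.ceil(math.log(1 - confidence) / math.log(1 - frequency))


print(clones_to_sequence(1 / 10_000))
```

For an element present at 1:10,000, the estimate is roughly 30,000 clones, whereas a single array hybridization interrogates the whole pool at once.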

Whole genome yeast arrays were then used to analyze the DNA resulting from two-hybrid
screens for protein-protein interactions. Identification of proteins that physically interact
within the cell can suggest how a gene product participates in cellular processes (8-11). In
the two-hybrid screen, two proteins are expressed in yeast as fusions to either the
DNA-binding domain or the activation domain of a transcription factor. Physical interaction of the
two proteins reconstitutes transcriptional activity, turning on a chromosomal gene essential
for survival under selective conditions (8). In screening for novel protein-protein interactions,
yeast cells are first transformed with a plasmid encoding a specific DNA-binding fusion
protein. A plasmid library of activation domain fusions derived from genomic DNA is then
introduced into these cells. Transcriptional activation fusions found in cells which survive
selective conditions are considered to encode peptide domains which may interact with the
DNA-binding domain fusion protein.
Library construction

A large yeast genomic DNA library of 5 x 10* clones (designated the «FRYL»
library) was made in the E. coli MR32 strain according to a previously described procedure
[Elledge et al., PNAS USA, 88, 1731-1735 (1991)].

- Origin of the plasmid: pACTII (with minor modifications). 

- Origin of the genomic DNA: Ym955 (a gift of M. Johnston). 
Ym955 = ura3-52, his3-200, ade2-101, lys2-801, leu2-3,112, trp1-901,
tyr1-501, gal4-542, gal80-538.

his3-200, trp1-901, gal4-542 and gal80-538 are deletions of all coding
sequences. 

Genomic DNA was sonicated and blunted with three modification enzymes (mung bean
nuclease, T4 DNA polymerase, and Klenow). Adaptors were ligated to the blunted ends. The
adaptors were designed to allow blunt ligation at one extremity and cohesive ligation with a
3-nucleotide overhang at the other end.

The sequences of the adaptors were 5'-ATCCCGGACGAAGGCC (SEQ ID NO: 1) and
5'-GGCCTTCGTCCGG (SEQ ID NO: 2), and only the former was phosphorylated before
annealing to avoid self-ligation of the adaptors. After ligation the inserts were purified from
free adaptors and small fragments on a Chroma Spin column (Clontech).

The pACTII vector was digested with BamHI and the extremities were filled in with
dGTP by the Vent (exo-) polymerase (New England Biolabs), generating extremities
complementary to the 3-nucleotide overhang of the adaptors but preventing self-ligation of the
vector (BamHI sites are reconstituted at each end of the insert). This strategy prevents
self-ligation of the vector or ligation of multiple inserts.
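
The end compatibility described above can be checked with a short script. The following is only an illustrative sketch, not part of the original procedure: the function names are hypothetical, and the 5'-GATC overhang assumed for the BamHI-digested vector is the standard product of cleavage at G^GATCC.

```python
# Illustrative check of adaptor/vector end compatibility (names are hypothetical).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


ADAPTOR_LONG = "ATCCCGGACGAAGGCC"   # SEQ ID NO: 1
ADAPTOR_SHORT = "GGCCTTCGTCCGG"     # SEQ ID NO: 2


def five_prime_overhang(long_strand: str, short_strand: str) -> str:
    """Return the unpaired 5' bases of long_strand when the two strands
    anneal flush at the other (blunt) end."""
    paired = reverse_complement(short_strand)
    if not long_strand.endswith(paired):
        raise ValueError("strands do not anneal flush at the blunt end")
    return long_strand[: len(long_strand) - len(paired)]


adaptor_overhang = five_prime_overhang(ADAPTOR_LONG, ADAPTOR_SHORT)

# Assumption: BamHI leaves a 5'-GATC overhang. Filling in with dGTP alone adds
# only the G opposite the terminal C, so a 3-nucleotide 5'-GAT overhang remains.
vector_overhang = "GATC"[:-1]

# The ends can ligate if the two overhangs are reverse complements of each other.
ends_compatible = reverse_complement(vector_overhang) == adaptor_overhang
# A self-complementary overhang would allow vector self-ligation.
vector_self_compatible = reverse_complement(vector_overhang) == vector_overhang
```

Under these assumptions the annealed adaptor exposes a three-base 5' overhang that pairs with the partially filled vector end, while the vector end no longer pairs with itself.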

Inserts and vectors were ligated together and ligation products were used to
transform E. coli MR32. 5 x 10* clones were obtained. All transformants were scraped from
dishes and the pool of transformants was frozen in LB/glycerol. The titer of the library was
1-2 x 10' transformants/ml.
EXAMPLE 1

To demonstrate the analysis of a genetic selection using oligonucleotide arrays, a
two-hybrid screen was conducted for the Saccharomyces cerevisiae gene YMR117c.
YMR117c is a previously uncharacterized ORF recently found by two-hybrid analysis to
interact with the U2 snRNP-associated splicing factor, Prp11p (4).
Plasmids and strains

For the YMR117c screen, the yeast strains used for two-hybrid screening were
CG1945 and Y187 (Clontech). A pAS2ΔΔ bait vector was constructed from the pAS2
plasmid (Clontech) by deletion of the CYH2 gene and the HA epitope. A bait plasmid was
constructed by PCR amplification of YMR117c from genomic DNA and cloning into
pAS2ΔΔ as a BamHI-PstI fragment. The bait plasmid was verified by sequencing after
cloning.

The polynucleotide insert containing the chimeric gene GAL4/YMR117c consists of
SEQ ID NO: 3, wherein nucleotides 1-441 correspond to the GAL4 DNA binding domain.
The resulting encoded fusion polypeptide consists of SEQ ID NO: 4, wherein amino acids
1-147 correspond to the GAL4 DNA binding domain and amino acids 148-378 correspond
to the YMR117c peptide sequence.
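
The nucleotide and amino acid coordinates given for the fusion can be cross-checked arithmetically. The sketch below is illustrative only; the helper name is hypothetical, and the total length of 1134 nucleotides is the length stated for SEQ ID NO: 3 in the sequence listing.

```python
# Illustrative cross-check of the fusion coordinates (helper name is hypothetical).
CODON_LENGTH = 3


def nt_span_to_aa_span(nt_start: int, nt_end: int) -> tuple:
    """Map a 1-based, in-frame nucleotide span to the 1-based span of codons it covers."""
    if (nt_start - 1) % CODON_LENGTH or nt_end % CODON_LENGTH:
        raise ValueError("span is not aligned to codon boundaries")
    return (nt_start - 1) // CODON_LENGTH + 1, nt_end // CODON_LENGTH


FUSION_LENGTH_NT = 1134  # length given for SEQ ID NO: 3

gal4_aa = nt_span_to_aa_span(1, 441)                  # GAL4 DNA binding domain
bait_aa = nt_span_to_aa_span(442, FUSION_LENGTH_NT)   # remainder of the fusion
```

With these inputs, nucleotides 1-441 map to positions 1-147 and nucleotides 442-1134 map to positions 148-378, in agreement with the ranges stated above.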
YMR117c two-hybrid screen

CG1945 yeast cells were transformed with the bait vector and used in a mating
strategy (4). Y187 cells were first transformed with DNA from the FRYL two-hybrid library,
transformants were pooled, and aliquots of the cell suspension were frozen. The two strains
were mixed, concentrated onto filters, and incubated on rich medium for 4.5 h at 30 °C. The
cells were collected, and a 10"^ dilution was spread on -L, -LW, and -W plates to score the
number of parental cells and the number of diploids. The rest of the cell suspension was
spread on -LWH plates and incubated for three days at 30 °C. 8.5 x 10' diploids were
screened, and 5800 His+ colonies were selected. 10 ml of an X-Gal mixture (0.5 % agar,
0.1 % SDS, 6 % dimethylformamide, and 0.04 % X-Gal) was poured on the plates and the
plates were incubated at 30 °C. Blue clones were checked after a 30 min to 18 h incubation
and streaked on -LWH selective plates. 108 total clones were identified as positive by the
X-Gal assay and processed as described below.



WO 99/35256    PCT/IB99/00048

PCR amplification and labeling of DNA from pooled clones

A volume of 200 µl of a saturated culture (approximately 1 x 10' cells) of each of the
108 positive two-hybrid clones from the YMR117c two-hybrid screen was pooled (Fig. 1)
and DNA was isolated and purified as previously described (5). Primers containing vector
sequence at the 3' end were used to PCR amplify gene inserts from the plasmid mixture.
Specifically, using the vector-based primers T7FOR
(5'-GAATTGTAATACGACTCACTATAGGGAGGTGATGAAGATACCCCACC-3') (SEQ ID NO: 5) and T3REV
(5'-AGATGCAATTAACCCTCACTAAAGGGAGACGGGGTTTTTCAGTATCTACGATTC-3') (SEQ ID NO: 6),
all library inserts were PCR amplified in a single reaction. The
50 µl PCR reaction contained: 2.5 U of Taq DNA polymerase, 10 mM Tris (pH 8.5), 50
mM KCl, 1.5 mM MgCl2, 0.2 µM each primer, and 250 µM each dNTP. Conditions used
for amplification were as follows: 30 cycles at 96°C for 30 s, 62°C for 30 s, 72°C for 2 min.
Reaction products were purified in a Qiaquick spin column (Qiagen). 1 µg total PCR product
was fragmented with 0.1 U DNAse I (amplification grade, GibcoBRL) for 2 min in 35 µl
containing: 10 mM Tris-acetate (pH 7.5), 10 mM magnesium acetate, 50 mM potassium
acetate, and 1 5 mM CoCl. The DNAse I reaction was then boiled for 15 min, chilled on ice,
and incubated with 1 nmole biotin-ddATP (NEN) and 25 U terminal transferase (Boehringer
Mannheim) for 1 hour at 37°C. SSPE-T hybridization buffer (0.9 M NaCl, 60 mM NaH2PO4,
6 mM EDTA, 0.005 % Triton-X-100) was added to a final volume of 200 µl.
Generation of cDNA product from PCR product 

RNA was transcribed from 240 ng of purified PCR product using T7 polymerase 
(Ambion). The reaction was incubated an additional hour with 20 U DNAse I. RNA was 
purified using an RNA spin column (Qiagen). 2.0 ng of RNA was used for first strand cDNA 
synthesis (Promega). Reaction products were purified in a Qiaquick spin column (Qiagen),
and 1 ng total PCR product was digested and prepared for hybridization as described above.
Hybridization of DNA to the high-density oligonucleotide array 

DNA products generated from the library plasmid pool were partially DNAse I
digested, biotinylated, and hybridized to whole genome arrays (Fig. 3). Specifically, arrays
were prewashed with hybridization buffer (described above) 5 min prior to sample
hybridization. Following a 5 min incubation at 99°C, the sample was chilled on ice, allowed
to return to room temperature, and applied to the array. After a 12 hour hybridization at
42°C, the array was washed 10 times with 6x SSPE-T, washed with 0.5x SSPE-T for 15
min, and stained with a streptavidin-phycoerythrin conjugate (Molecular Probes) for 10 min,
all at 42°C. The staining buffer contained 6x SSPE-T, 0.5 mg/ml bovine serum albumin, and
1 mg/ml streptavidin-phycoerythrin. The array was washed 5 times with 6x SSPE-T prior
to scanning. Hybridization patterns were detected by using an argon ion laser to excite
phycoerythrin; the resulting emission was detected using a photomultiplier tube through a
560 nm bandpass filter (Molecular Dynamics). The entire array was read at a resolution of
7.5 µm in less than 20 min, generating quantitative signal for each probe element. The
collected data were analyzed using image and data analysis software (Affymetrix).

Orientation of genes was determined by hybridization of biotinylated cDNA products.
All genes identified by array hybridization are listed in Table 1. 
Criteria for gene detection 

On chips A, B, C, and D, which contain an average of 20 probes per gene, the
presence of a gene fragment was determined by visual and quantitative detection of three
contiguous positive probes. On the E chip, which contains probes for 5' sequence from genes
which are longer than 1 kb, detection of two contiguous positive probes was considered
sufficient to detect a gene fragment.
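
The detection criteria above amount to finding a run of adjacent positive probes. The following is a minimal sketch of that rule, assuming that each probe has already been scored as positive or negative (the scoring itself is not specified here); the function names are hypothetical.

```python
# Minimal sketch of the contiguous-probe rule; probe calls are assumed to be
# pre-scored booleans, ordered by probe position along the gene.
def longest_positive_run(probe_calls):
    """Return the length of the longest run of consecutive positive probes."""
    best = run = 0
    for is_positive in probe_calls:
        run = run + 1 if is_positive else 0
        best = max(best, run)
    return best


def gene_fragment_detected(probe_calls, chip):
    """Apply the stated thresholds: three contiguous positive probes on
    chips A-D, two contiguous positive probes on the E chip."""
    required = 2 if chip == "E" else 3
    return longest_positive_run(probe_calls) >= required


# Hypothetical probe calls for illustration.
calls_contiguous = [False, True, True, True, False]
calls_split = [True, True, False, True]
```

For example, `calls_split` contains three positive probes but only two are adjacent, so it would meet the E chip criterion and fail the criterion for chips A to D.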
Comparison of hybridization and sequencing results 

Library plasmid inserts were amplified by PCR and the insert junctions with the
GAL4 domain were sequenced and precisely identified in the yeast genome using the BLAST
program, the Saccharomyces Genome Database, and the Yeast Protein Database. In parallel,
clones were used to inoculate 200 µl cultures. Saturated cultures were pooled and processed
as previously described. 

The hybridization results from the YMR117c screen were compared to results
obtained by dideoxy sequencing of all 108 DNA clones. Nineteen of twenty-two independent
loci were identified by hybridization, with no false positives. Based on analysis of the 
hybridizing array elements, we were also able to identify the region of the gene present in 
each insert (Table 2). 

The three loci that were not detected by array hybridization were either not
represented on the array or were resistant to PCR amplification. One of the undetected
inserts, YLR276c, was difficult to amplify by PCR and could only be sequenced after plasmid
rescue. The other two undetected inserts start within two hundred bases upstream of the 3'
end of the gene, in a region covered by only one or no probes. Therefore, the signal for these
genes was not recognized as significant because there was not a consistent pattern of
hybridization extending across multiple probes.
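
The effect of insert position on detectability can be illustrated by counting the probes that lie entirely within an insert. This is a sketch under stated assumptions: the probe start positions and the 1000-nucleotide ORF below are hypothetical and are not taken from the arrays, the insert is assumed to extend from its 5' end through the 3' end of the ORF, and the function name is illustrative. Only the 25-mer probe length comes from the description.

```python
# Sketch: count probes fully contained in an insert (layout below is hypothetical).
PROBE_LENGTH = 25  # 25-mer probes, as described for the arrays


def probes_within_insert(probe_starts, insert_start, orf_length):
    """Return the 1-based start positions of probes whose full length lies
    between the insert 5' end and the 3' end of the ORF."""
    return [
        start
        for start in probe_starts
        if start >= insert_start and start + PROBE_LENGTH - 1 <= orf_length
    ]


# Hypothetical probe layout on a hypothetical 1000-nt ORF.
example_probe_starts = [40, 210, 395, 560, 730, 905]

near_3_prime = probes_within_insert(example_probe_starts, insert_start=850, orf_length=1000)
further_upstream = probes_within_insert(example_probe_starts, insert_start=300, orf_length=1000)
```

In this hypothetical layout an insert starting 150 bases from the 3' end overlaps a single probe and so cannot produce a run of contiguous positive probes, whereas an insert starting further upstream overlaps four.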
EXAMPLE 2 

To further demonstrate this method, a two-hybrid screen for the gene YMR138w
was also carried out and analyzed by array hybridization. YMR138w (CIN4) is a gene in
which mutations cause supersensitivity to the antimicrotubule drug benomyl, as well as
increased rates of chromosome loss (12). YMR138w is homologous to the ARF1 class of
small GTP-binding proteins, but a distinct role in microtubule function is not yet known. The
complete results for this screen are listed in Table 1.
Plasmids and strains 

For the YMR138w screen, the yeast strains used were Y190 and Y187, the cyh2R-marked
derivatives of Y159 and Y153, respectively. The library was a yeast cDNA library
fused to the transcriptional activation domain of GAL4 (gift of S. Elledge, Baylor College
of Medicine). The bait vector pTS434 was constructed by cloning CIN4 into pAS1-CYH2
(Clontech) as an NcoI-BamHI fragment.
YMR138W Two-hybrid screen 

Y190 containing pTS434 was transformed with the cDNA library using a lithium acetate-based
protocol. 5 x 10* transformants were screened by plating on -Ade selective media, and
114 Ade+ colonies were selected. All 114 colonies were patched onto +Ade plates and lifted
onto BA85 nitrocellulose filters (Schleicher and Schuell) and immersed in liquid nitrogen for
10 s. The filters were then soaked with 3 ml of Z buffer (60 mM Na2HPO4, 40 mM
NaH2PO4, 10 mM KCl, 1 mM MgSO4, and 50 mM β-mercaptoethanol; pH 7.0) containing 0.05 %
X-Gal. Filters were incubated at 30°C for 6 h and scored for the development of blue color.
86 clones were positive by a lacZ filter assay. All 86 clones passed testing for solo activation
by streaking strain Y190 carrying the library isolate and pTS434 on -L plates plus 5 ng/ml
cycloheximide. The strains were confirmed to have lost the TRP-containing plasmid by
failure to grow on -W media. 81 clones passed testing for specificity by mating strain Y190
carrying library plasmids with Y187 carrying the negative controls pAS-CDK2, pAS1-lamin,
pAS1-p53, and pAS1-rev (a gift of D. Amberg). Library plasmid inserts were
amplified by PCR and the insert junctions with the GAL4 domain were sequenced and
precisely identified in the yeast genome using the BLAST program, the Saccharomyces
Genome Database (http://genome-www.stanford.edu) and the Yeast Protein Database
(http://www.proteome.com). In parallel, clones were used to inoculate 200 µl cultures.
Saturated cultures were collected, pooled, and processed as previously described.

Hybridization of DNA to the high-density oligonucleotide array

DNA products generated from the library plasmid pool were partially DNAse I
digested, biotinylated, and hybridized to whole genome arrays. Specifically, arrays were
prewashed with hybridization buffer (described above) 5 min prior to sample hybridization.
Following a 5 min incubation at 99°C, the sample was chilled on ice, allowed to return to
room temperature, and applied to the array. After a 12 hour hybridization at 42°C, the array
was washed 10 times with 6x SSPE-T, washed with 0.5x SSPE-T for 15 min, and stained
with a streptavidin-phycoerythrin conjugate (Molecular Probes) for 10 min, all at 42°C. The
staining buffer contained 6x SSPE-T, 0.5 mg/ml bovine serum albumin, and 1 mg/ml
streptavidin-phycoerythrin. The array was washed 5 times with 6x SSPE-T prior to scanning.
Hybridization patterns were detected by using an argon ion laser to excite phycoerythrin; the
resulting emission was detected using a photomultiplier tube through a 560 nm bandpass
filter (Molecular Dynamics). The entire array was read at a resolution of 7.5 µm in less than
20 min, generating quantitative signal for each probe element. The collected data were
analyzed using image and data analysis software (Affymetrix).

Orientation of genes was determined by hybridization of biotinylated cDNA products. 
All genes identified by array hybridization are listed in Table 1.
Conclusion 

Both two-hybrid screens identified interactors consistent with known results for each
gene. The previously detected interaction of YMR117c with the Prp11p splicing factor has
suggested that YMR117c could have a functional connection with the U2 snRNP (4). Several
of the interactors found in this screen also have known associations with the U2 snRNP. For
example, Yml049c has previously been found to interact with the Prp9p splicing factor (4).
Like CIN4, YPL241c (CIN2) was first isolated as a mutation displaying supersensitivity to
antimicrotubule agents (12). Mutations in both CIN2 and CIN4 have already been shown to
be epistatic to mutations in CIN1, a gene implicated in the post-chaperonin folding of yeast
tubulin (13). However, these results are the first evidence for a physical interaction between
CIN2 and CIN4 and suggest that they may act as a complex to regulate specific
protein-folding pathways. Further investigations are needed to establish the biological significance
of interactions from both screens.
Table 1

Yeast ORFs identified by array analysis of two-hybrid screens

                      YMR117c             YMR138w (CIN4)

                      YBR020w (GAL1)      YDL117w
                      YCL032w (STE50)     YDR087c
                      YCR073c (SSK22)     YGL172w (NUP49)
                      YDR104c             YHR141c (MAK18)
                      YER018c             YLR109w
                      YER032w (FIR1)      YNR050c (LYS9)
                      YFR046c             YPL241c (CIN2)
                      YGL197w
                      YIL144w
                      YLR319c (BUD6)
                      YLR419w
                      YML049c
                      YMR224c (MRE11)
                      YOL018c
                      YOL034w
                      YOR206w
                      YPR010c (RPA135)
                      YPR145w (ASN1)

Non-protein                               18S and 25S rRNA
encoding DNA

Reverse               YNL291c             YBR189w
orientation                               YDR381w
                                          YNL301c (RP28B)
                                          YNR035c
                                          YOL056w (GPM3)

ORF loci and names are listed for genes detected by array hybridization of PCR products
derived from end-products of a two-hybrid screen. Because inserts in the non-coding
orientation comprise a significant proportion of false positives in the two-hybrid screen,
RNA was transcribed from the upstream T7 promoter and used to generate exclusively
antisense cDNA strands with reverse transcriptase. cDNA products were then biotinylated,
fragmented, and hybridized as described. Genes detected by double stranded DNA
hybridization but absent in cDNA hybridization are considered to be in reverse orientation.
Control experiments were performed to confirm that this method is orientation-specific (data
not shown).
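
The orientation rule stated in this legend is a comparison of two sets of detected genes. The sketch below illustrates it with placeholder gene names (ORF_A, ORF_B, ORF_C), which are hypothetical and do not correspond to entries in Table 1; the function name is likewise illustrative.

```python
# Sketch of the orientation rule: genes seen with double stranded DNA but not
# with strand-specific cDNA are called reverse orientation.
def classify_orientation(dsdna_hits, cdna_hits):
    """Split genes detected by double stranded DNA hybridization according to
    whether they were also detected by cDNA hybridization."""
    dsdna_hits, cdna_hits = set(dsdna_hits), set(cdna_hits)
    return {
        "coding": sorted(dsdna_hits & cdna_hits),
        "reverse": sorted(dsdna_hits - cdna_hits),
    }


# Placeholder inputs for illustration only.
example = classify_orientation(
    dsdna_hits=["ORF_A", "ORF_B", "ORF_C"],
    cdna_hits=["ORF_A", "ORF_C"],
)
```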





Table 2

Comparison of sequencing and hybridization for clone 5' ends

ORF name    ORF size (nt)    5' end by sequencing    5' end by array probe

YBR020w     1584             1151                    1164
YCL032w     1038             131                     168
YDR104c     3735             3230                    3234
YER032w     2775             1808                    1860
YFR046c     1083             4                       114
YGL197w     4461             3974                    4092
YML049c     4083             2597                    2616
YMR224c     2076             531                     566
YOL018c     1191             257                     324
YOL034w     3279             620                     669


ORF name, ORF size, and the 5' ends of identified genes, determined either by sequencing
or array hybridization, for 10 clones from the YMR117c screen. For genes sequenced
multiple times as different inserts, the end of the most 5' clone is listed. The 5' end as
detected by array hybridization indicates the most 5' nucleotide of the most 5' probe detected
as positive. Small disparities between sequencing and hybridization are the result of insert
5' ends falling in between probes on the array. Although array hybridization does not confirm
that inserts are in frame with respect to the start codon, previous work has shown that
frameshifting events generally lead to production of protein regardless of the precise fusion
junction between gene insert and transcriptional activation domain (11).
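
The disparities between the two 5' end estimates can be computed directly from the values in Table 2. The following sketch is illustrative; the variable names are hypothetical and the numbers are those listed in the table.

```python
# Offsets between the 5' end found by sequencing and by array hybridization,
# using the values listed in Table 2.
TABLE_2 = [
    # (ORF name, ORF size in nt, 5' end by sequencing, 5' end by array probe)
    ("YBR020w", 1584, 1151, 1164),
    ("YCL032w", 1038, 131, 168),
    ("YDR104c", 3735, 3230, 3234),
    ("YER032w", 2775, 1808, 1860),
    ("YFR046c", 1083, 4, 114),
    ("YGL197w", 4461, 3974, 4092),
    ("YML049c", 4083, 2597, 2616),
    ("YMR224c", 2076, 531, 566),
    ("YOL018c", 1191, 257, 324),
    ("YOL034w", 3279, 620, 669),
]

# Positive offset: the most 5' positive probe lies downstream of the insert 5' end.
offsets = {
    orf: array_end - sequencing_end
    for orf, _size, sequencing_end, array_end in TABLE_2
}

largest_offset_orf = max(offsets, key=offsets.get)
mean_offset = sum(offsets.values()) / len(offsets)
```

Every offset in this table is positive, which is what would be expected if the first positive probe always lies at or downstream of the true insert 5' end.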
CLAIMS

1. A method for identification of a polynucleotide comprising the steps of:

a) subjecting a polynucleotide of interest to a two-hybrid screening method;

b) subjecting the polynucleotides selected at step a) to a hybridization
reaction onto a matrix substrate onto which oligonucleotide or polynucleotide probes have
been immobilized.

2. A method for identifying a polynucleotide encoding a first polypeptide, said first 
polypeptide being able to interact with a second polypeptide of interest, comprising the steps 
of: 

a) providing a recombinant host cell containing a detectable gene, wherein 
the detectable gene expresses a detectable polypeptide when the detectable gene is activated 
by an amino acid sequence including a transcriptional activation domain; 

b) providing a first chimeric gene that is capable of being expressed in the
host cell, the first chimeric gene comprising a DNA sequence that encodes a first hybrid
polypeptide encoded by a given prokaryotic or eukaryotic organism, said first hybrid
polypeptide comprising:

(i) the transcriptional activation domain; and

(ii) a first test polypeptide that is to be tested for interaction with the
second test polypeptide;

c) providing a second chimeric gene that is capable of being expressed in the
host cell, the second chimeric gene comprising a DNA sequence that encodes a second
hybrid polypeptide, the second hybrid polypeptide comprising:

(i) a DNA-binding domain that recognizes a binding site on the 
detectable gene in the host cell; and 

(ii) a second test polypeptide that is to be tested for interaction with at 
least one first test polypeptide; 

wherein interaction between the first test polypeptide and the second test 
polypeptide in the host cell causes the transcriptional activation domain to activate 
transcription of the detectable gene; 

d) introducing the first chimeric gene and the second chimeric gene into the
host cell;

e) subjecting the host cell to conditions under which the first hybrid
polypeptide and the second hybrid polypeptide are expressed in sufficient quantity for the
detectable gene to be activated;

f) selecting the host cell clones for which the detectable gene has been
expressed to a degree greater than expression in the absence of interaction between the first
test polypeptide and the second test polypeptide;

g) optionally pooling the clones that have been positively selected at step f);

h) amplifying the polynucleotides of interest contained in the clones of step
f) or g) with a pair of oligonucleotide primers respectively hybridizing with a plasmid
sequence located at the 5' end of the polynucleotide of interest and with a sequence
complementary to a plasmid sequence located at the 3' end of the polynucleotide of interest
coding for the first polypeptide;

i) hybridizing the amplified polynucleotides obtained at step h) to a matrix
substrate on which has been bound, at known locations, a plurality of sets of oligonucleotide
or polynucleotide probes of predetermined sequence, each bound set of oligonucleotide or
polynucleotide probes being able to hybridize with a specific polynucleotide carried by the
genome of the organism from which the polynucleotide coding for the first test polypeptide
belongs;

j) detecting the locations of the polynucleotide hybrid complexes obtained
at step i) on the matrix substrate;

k) optionally determining the quantity of each hybrid complex detected at
step j).

3. A method for identifying a polynucleotide encoding a first polypeptide that
inhibits the interaction between a second polypeptide and a third polypeptide comprising the
steps of:

a) providing a recombinant host cell containing a detectable gene wherein
the detectable gene expresses a detectable polypeptide when the detectable gene is activated
by an amino acid sequence including a transcriptional activation domain;

b) providing a first gene that is capable of being expressed in the host cell,
said first gene comprising a DNA sequence that encodes a first polypeptide encoded by a
given prokaryotic or eukaryotic organism, and for which its inhibition properties on the
interaction between a second and a third polypeptide is tested;

c) providing a second chimeric gene that is capable of being expressed in the
host cell, the second chimeric gene comprising a DNA sequence that encodes a second
hybrid polypeptide encoded by a given prokaryotic or eukaryotic organism, said second
hybrid polypeptide comprising:

(i) the transcriptional activation domain; and 

(ii) a second test polypeptide that interacts with a third polypeptide; 

d) providing a third chimeric gene that is capable of being expressed in the 
host cell, the third chimeric gene comprising a DNA sequence that encodes a third hybrid 
polypeptide, the third hybrid polypeptide comprising: 

(i) a DNA-binding domain that recognizes a binding site on the 
detectable gene in the host cell; and 

(ii) a third test polypeptide that interacts with the second test 
polypeptide; 

wherein interaction between the second test polypeptide and the third test polypeptide in the 
host cell causes the transcriptional activation domain to activate transcription of the 
detectable gene; 

e) introducing the first gene, the second chimeric gene and the third 
chimeric gene into the host cell; 

f) subjecting the host cell to conditions under which the second hybrid 
polypeptide and the third polypeptide are expressed in sufficient quantity for the detectable 
gene to be activated; 

g) selecting the host cell clones for which the detectable gene has been 
expressed to a degree lesser than its expression level in the absence of expression of the first 
polypeptide; 

h) optionally pooling the clones that have been positively selected at step g); 

i) amplifying the polynucleotides of interest contained in the clones of step 
g) or h) with a pair of oligonucleotide primers respectively hybridizing with a plasmid 
sequence located at the 5' end of the polynucleotide of interest and with a sequence 
complementary to a plasmid sequence located at the 3' end of the polynucleotide of interest
coding for the first polypeptide; 

j) hybridizing the amplified polynucleotides obtained at step i) to a matrix
substrate on which has been bound, at known locations, a plurality of sets of oligonucleotide
or polynucleotide probes of predetermined sequence, each bound set of oligonucleotide or
polynucleotide probes being able to hybridize with a specific polynucleotide carried by the
genome of the organism from which the polynucleotide coding for the first test polypeptide
belongs;

k) detecting the locations of the polynucleotide hybrid complexes obtained
at step j) on the matrix substrate;

l) optionally determining the quantity of each hybrid complex detected at
step k).

4. The method according to claim 2 or 3, wherein

a) some of the polynucleotides obtained at step f) or g) of claim 2 or at step
g) or h) of claim 3 are separated and subjected to a DNA amplification reaction with a pair
of primers wherein at least one of the primers comprises, at its 5' end, a promoter region
recognized by a specific RNA polymerase;

b) the resulting amplified polynucleotides of the above step a) are incubated
in the presence of the corresponding RNA polymerase in an acellular enzyme medium;

c) the mRNA obtained at the above step b) is incubated in the presence of
a reverse transcriptase type enzyme;

d) the cDNA molecule obtained at the above step c) is hybridized to a
matrix substrate on which has been bound, at known locations, a plurality of sets of
oligonucleotides of predetermined sequence, each bound set of oligonucleotide being able
to hybridize with a specific polynucleotide carried by the genome of the organism from
which the polynucleotide coding for the first test polypeptide belongs; and

e) the locations of the polynucleotide hybrid complexes obtained at step d)
on the matrix substrate are determined and compared with the results obtained from the method
of claim 2 or claim 3.

5. The method according to any one of claims 1 to 3, wherein the transcriptional
activator is from GAL4.

6. The method according to claim 4, wherein the promoter region contained in the
primer used at step a) is the bacteriophage T7 promoter region and the RNA polymerase
used at step b) is the bacteriophage T7 polymerase.

7. The method according to any one of claims 1 to 3 wherein the part of the first 
chimeric gene coding for the first test polypeptide is provided by a DNA library. 
8. The method according to claim 7 wherein the DNA library has been prepared
from the genome or from the mRNA of a prokaryotic host.

9. The method according to claim 7 wherein the DNA library has been prepared
from the genome or from the mRNA of a eukaryotic host.

10. The method according to claim 10 wherein the DNA library has been prepared
from the genomic DNA of Saccharomyces cerevisiae.

11. The method according to any one of claims 1 to 3, wherein the sets of
oligonucleotide or polynucleotide probes bound to the substrate matrix are designed in such
a manner that every region of the whole genome of the prokaryotic or eukaryotic host
organism is able to specifically hybridize to at least one of said set of oligonucleotide or
polynucleotide probes.

12. The method according to claim 11, wherein two sets of oligonucleotide or
polynucleotide probes bound to the matrix substrate are complementary to adjacent
sequences in the genome of the prokaryotic or eukaryotic host distant one from each other
of less than one kilobase.

13. The method according to claim 12, wherein two sets of oligonucleotide or 
polynucleotide probes bound to the matrix substrate are complementary to adjacent 
sequences in the genome of the host, such that the distance between the sequences is less 
than 500 bases. 

14. The method according to claim 13, wherein two sets of oligonucleotide or 
polynucleotide probes bound to the matrix substrate are complementary to adjacent 
sequences in the genome of the host, such that the distance between the sequences is about 
50 bases.

15. A polynucleotide molecule that has been obtained with the method according
to any one of claims 1 to 3.

16. A polypeptide that is encoded by a polynucleotide according to claim 15.

17. A polypeptide that has been obtained with the method according to any one of
claims 1 to 3.

18. A peptide comprising a peptide domain interacting with the second test
polypeptide of interest.

19. A matrix substrate on which has been bound, at known locations, a plurality of
sets of oligonucleotide or polynucleotide probes of predetermined sequence, each bound set
of oligonucleotide or polynucleotide probes being able to hybridize with a specific
polynucleotide carried by the genome of the organism from which the polynucleotide coding
for the first test polypeptide belongs.

20. A matrix substrate comprising:

a) a plurality of immobilized sets of oligonucleotide or polynucleotide
probes of predetermined sequence, each bound set of oligonucleotide or polynucleotide
probes being able to hybridize with a specific polynucleotide carried by the genome of the
organism from which the polynucleotide coding for the first test polypeptide belongs;

b) at least one polynucleotide coding for one selected first test polypeptide
being hybridized thereto.

21. A computer useable medium containing computer readable data related to the
hybrid complexes formed within a matrix substrate according to claim 19 or claim 20.

FIGURE 3

SEQUENCE LISTING

<110> INSTITUT PASTEUR 

STANFORD UNIVERSITY 
AFFYMETRIX 

<120> SCREENING INTERACTOR MOLECULES WITH WHOLE GENOME 
OLIGONUCLEOTIDE OR POLYNUCLEOTIDE ARRAYS 

<130> 03495-0160-01000 

<140> 

<141> 

<150> US 09/003,335 and US 09/154,972 
<151> 1998-01-06 and 1998-09-17 
<160> 6 

<170> PatentIn Ver. 2.0
<210> 1 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: adaptor 
<400> 1 

atcccggacg aaggcc 16 
<210> 2 
<211> 13 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: adaptor 
<400> 2 

ggccttcgtc cgg 13
<210> 3 
<211> 1134 
<212> DNA 
<213> Saccharomyces cerevisiae
<400> 3 



atgaagctac tgtcttctat cgaacaagca tgcgatattt gccgacttaa aaagctcaag 60
tgctccaaag aaaaaccgaa gtgcgccaag tgtctgaaga acaactggga gcgtcgctac 120
tcccccaaaa cc«-^a;iaggt:c cccgctgact agggcacatc tgacagaagt ggaatcaagg 180
ctagaaagac tggaacagct atttctactg atttttcctc gagaagacct tgacatgaut 240
ttgaaaatgg actctttaca ggatataaaa gcattgttaa caggactatt cgtacaagat 300
aacgtgaata aagatgccgt cacagataga ttggcttcag tggagactga tatgcctcta 360
acattgagac agcatagaat aagtgcgaca tcatcatcgg aagagagtag taacaaaggt 420
caaagacagt tgactgtatc gccggaattt atggccatgg aggccccggg gatccgaagg 480
aacgcaagag ccatgtcaca aaaggataac ctactcgaca atccggttga atttttaaaa 540
gaggtcagag aaagttttga tattcagcaa gatgttgatg ccatgaaaag aatccgacac 600
gatcttgatg ttataaaaga ggaaagcgaa gcaagaatta gtaaagagca ttcaaaggtt 660
tctgagtcga acaagaaatt gaatgcggaa agaataaatg ttgctaaatt ggagggagac 720
ttagaatata ctaacgaaga gagcaatgag tttggtagta aagacgaact agttaaactt 780
ctgaaagatt tggacggatt ggaacgtaat attgtgtcac ttcgaagtga attggacgaa 840
aagatgaaat tgtacctcaa agatagtgaa ataatatcca caccgaacgg ttccaaaata 900
aaagcaaaag taattgaacc tgagctggaa gaacaaagtg cggtcacccc ggaagcaaac 960
gaaaatattc taaaattgaa gctatacaga tctttaggag ttattrtgga tttagaaaat 1020
gatcaagtcc ttattaacag aaaaaatgat gggaatattg atattttacc cttggacaat 1080
aacctcagcg atttctataa gaccaaatac atctgggaaa gattaggaaa gtga 1134



<210> 4 
<211> 378 
<212> PRT 

<213> Saccharomyces cerevisiae 
<400> 4 

Met Lys Leu Leu Ser Ser Ile Glu Gln Ala Cys Asp Ile Cys Arg Leu
1 5 10 15

Lys Lys Leu Lys Cys Ser Lys Glu Lys Pro Lys Cys Ala Lys Cys Leu
20 25 30

Lys Asn Asn Trp Glu Cys Arg Tyr Ser Pro Lys Thr Lys Arg Ser Pro
35 40 45

Leu Thr Arg Ala His Leu Thr Glu Val Glu Ser Arg Leu Glu Arg Leu
50 55 60

Glu Gln Leu Phe Leu Leu Ile Phe Pro Arg Glu Asp Leu Asp Met Ile
65 70 75 80

Leu Lys Met Asp Ser Leu Gln Asp Ile Lys Ala Leu Leu Thr Gly Leu
85 90 95

Phe Val Gln Asp Asn Val Asn Lys Asp Ala Val Thr Asp Arg Leu Ala
100 105 110

Ser Val Glu Thr Asp Met Pro Leu Thr Leu Arg Gln His Arg Ile Ser
115 120 125

Ala Thr Ser Ser Ser Glu Glu Ser Ser Asn Lys Gly Gln Arg Gln Leu
130 135 140

Thr Val Ser Pro Glu Phe Met Ala Met Glu Ala Pro Gly Ile Arg Arg
145 150 155 160

Asn Ala Arg Ala Met Ser Gln Lys Asp Asn Leu Leu Asp Asn Pro Val
165 170 175

Glu Phe Leu Lys Glu Val Arg Glu Ser Phe Asp Ile Gln Gln Asp Val
180 185 190

Asp Ala Met Lys Arg Ile Arg His Asp Leu Asp Val Ile Lys Glu Glu
195 200 205

Ser Glu Ala Arg Ile Ser Lys Glu His Ser Lys Val Ser Glu Ser Asn
210 215 220

Lys Lys Leu Asn Ala Glu Arg Ile Asn Val Ala Lys Leu Glu Gly Asp
225 230 235 240

Leu Glu Tyr Thr Asn Glu Glu Ser Asn Glu Phe Gly Ser Lys Asp Glu
245 250 255

Leu Val Lys Leu Leu Lys Asp Leu Asp Gly Leu Glu Arg Asn Ile Val
260 265 270

Ser Leu Arg Ser Glu Leu Asp Glu Lys Met Lys Leu Tyr Leu Lys Asp
275 280 285

Ser Glu Ile Ile Ser Thr Pro Asn Gly Ser Lys Ile Lys Ala Lys Val
290 295 300

Ile Glu Pro Glu Leu Glu Glu Gln Ser Ala Val Thr Pro Glu Ala Asn
305 310 315 320

Glu Asn Ile Leu Lys Leu Lys Leu Tyr Arg Ser Leu Gly Val Ile Leu
325 330 335

Asp Leu Glu Asn Asp Gln Val Leu Ile Asn Arg Lys Asn Asp Gly Asn
340 345 350

Ile Asp Ile Leu Pro Leu Asp Asn Asn Leu Ser Asp Phe Tyr Lys Thr
355 360 365

Lys Tyr Ile Trp Glu Arg Leu Gly Lys Glx
370 375
<210> 5
<211> 47 
<212> DNA 

<213> Saccharomyces cerevisiae 
<400> 5 

gaattgtaat acgactcact acagggaggt gatgaagata ccccacc
<210> 6
<211> 54 
<212> DNA 

<213> Saccharomyces cerevisiae 
<400> 6 

agatgcaatt aaccctcact aaagggagac ggggtttttc agtatctacg attc 